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Abstract Eukaryotic DNA is packaged into chromatin: one-dimensional arrays of nucleosomes sepa- 
rated by stretches of linker DNA are folded into 30-nm chromatin fibers which in turn form higher-order 
structures Pj. Each nucleosome, the fundamental unit of chromatin, has 147 base pairs (bp) of DNA 
wrapped around a histone octamer j2] . In order to describe how chromatin fiber formation affects nu- 
cleosome positioning and energetics, we have developed a thermodynamic model of finite-size particles 
with effective nearest-neighbor interactions and arbitrary DNA-binding energies. We show that both 
one- and two-body interactions can be accurately extracted from one-particle density profiles based 
on high-throughput maps of in vitro or in vivo nucleosome positions. Although a simpler approach 
that neglects two-body interactions (even if they are in fact present in the system) can be used to 
predict sequence determinants of nucleosome positions, the full theory is required to disentangle one- 
and two-body effects. Finally, we construct a minimal model in which nucleosomes are positioned by 
steric exclusion and two-body interactions rather than intrinsic histone-DNA sequence preferences. 
The model accurately reproduces nucleosome occupancy patterns observed over transcribed regions in 
living cells. 

Keywords Chromatin structure • Nucleosome positioning • One-dimensional classical fluid of 
interacting particles 
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1 Introduction 

Eukaryotic DNA in packaged into nucleosomes. Each nucleosome consists of a 147 bp-long DNA seg- 
ment wrapped around a histone octamer in ^ 1.7 turns of a left-handed superhelix In addition to its 
primary function of DNA compaction, chromatin modulates DNA accessibility to transcription factors 
and other molecular machines, affecting gene transcription as well as DNA repair, maintenance, and 
replication. In S. cerevisiae, arrays of nucleosomes cover 80% of genomic DNA, exerting profound 
influence on gene regulatory programs |3]l4jl5l[6] . 

Because the free energy of bending a DNA segment into a superhelix will vary depending on its 
nucleotide sequence and composition 011], nucleosomes exhibit a range of in vitro formation ener- 
gies [^ITU] (although almost any DNA sequence can be packaged into a nucleosome). Recent work 
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Fig. 1 (Color online) A typical nucleosome configuration near the transcr ipti on start site (TSS). The blue 
background represents in vivo nucleosome occupancy from Zawadzki et al. [21], averaged over all genes. The 
occupancy profile is normalized by its average in the [—500, 500] bp window. The intensity of the blue color is 
proportional to the degree of nucleosome localization. 



has clarified the role of sequence rules that influence nucleosome positioning: genome-wide in vitro 
reconstitution experiments have confirmed that nucleosome architecture over promoters and genes is 
partially established by DNA sequence, mostly as a result of nucleosome depletion from A/T-rich, 
nucleosome-disfavoring sequences on both ends of the transcript [11''12''13]. Hovifever, even if genomic 
DNA from S. cerevisiae is mixed with histones in a 1:1 mass ratio (leading to the maximum nucleosome 
occupancy of 0.82 which is close to the in vivo value |13j). nucleosomes are not strongly localized and, 
on average, nucleosome occupancy is just ^ 20 — 30% lower over nucleosome-depleted regions (NDRs) 
compared to the mean occupancy in a window which includes both the coding region and adjacent 
sequence. The absence of nucleosome localization in vitro and shallow NDRs indicate that the absolute 
magnitude of intrinsic histone-DNA interactions is less than 1 kgT. 

In vivo, 5' and 3' NDRs flanking the transcript are much more pronounced (^ 60 — 70% occu- 
pancy depletion on average with respect to the mean [l2l[T4llT5l[T6] ) . establishing a striking pattern 
of nucleosome localization over genie regions simply due to steric exclusion which causes nucleosomes 
to "phase off" potential barriers [T7] (Fig. [T]). Although the exact nature of these in vivo barriers is 
unknown and may vary between cell types and environmental conditions, they are likely established 
through a combined action of RNA polymerase, ATP-dependent chromatin remodeling enzymes and 
DNA-binding proteins HHUHlin]. 

Nucleosome positions and formation energies can be predicted using a thermodynamic model which 
takes intrinsic histone-DNA sequence preferences and one-dimensional steric exclusion into account [22] . 
In this approach, sequence determinants of nucleosome energetics are inferred directly from experimen- 
tally available nucleosome occupancy profiles. The profiles are obtained by isolating and sequencing 
mononucleosomal DNA on a large scale, followed by mapping nucleosomal sequence reads to the refer- 
ence genome [23 . However, structural regularity of the chromatin fiber imposes additional constraints 
on nucleosome positions |24p25j: linkers between neighboring nucleosomes become preferentially dis- 
cretized with the 10 — 11 bp periodicity of DNA helical twist The discretization is required to 
avoid steric clashes caused by the nucleosome rotating with respect to the linker DNA axis as the 
linker increases in length [24] . and more generally to maintain a regular pattern of protein-protein 
and protein-DNA contacts in the chromatin fiber [25]. Indeed, adding a short DNA segment to the 
linker will rotate the nucleosome with respect to the rest of the fiber, causing disruption of its periodic 
structure. The disruption is minimized if the length of the extra segment is a multiple of 10 — 11 bp, 
which brings the nucleosome into an equivalent rotational position. 

We have recently developed a rigorous approach in which linker length discretization is described by 
nearest-neighbor two-body interactions in a system of non-overlapping finite-size particles [27] (Fig. [2]). 
We have shown that it is possible to infer one-body energies given by intrinsic histone-DNA interactions 
simultaneously with two-body energies caused by chromatin fiber formation. The two-body potential 
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Fig. 2 (Color online) A typical configuration of 6 nucleosomes. Each nucleosome covers DNA sequence (rep- 
resented by colored boxes) which gives the one-body energy of the nucleosome, u. The one-body energy is 
represented by gray bars, f^^or simplicity, in this toy model, the one-body energy is entirely determined by the 
base pair located at the starting position of the nucleosome. In more realistic scenarios one-body energy is a 
function of the entire sequence occupied by the nucleosome. The two-body interaction (l>{i,j) acts only between 
neighboring nucleosomes, with two indices i and j representing their starting positions. 



can be deduced even in the presence of one-body energies related to rotational positioning of the 
nucleosome [T6 l [23 l fT3 ] . which have the same 10—11 bp helical twist periodicity. We have predicted the 
two-body interaction from high-throughput maps of nucleosome positions on the S. cerevisiae genome, 
and demonstrated its essential role in forming nucleosome occupancy patterns over genie regions. 

Here we present a detailed account of our theoretical framework. We also show that sequence 
determinants of nucleosome positioning can be obtained with an interaction- free model |22j . even if 
in reality the two-body potential and histone-DNA interactions have comparable magnitudes. To this 
end, we develop a minimally constrained sequence-specific model of nucleosome energetics in which the 
same energies are assigned to mono- and dinucleotides regardless of their exact position within the 147 
bp nucleosomal site [22] . However, only by taking into account the two-body interaction can we make 
a clear distinction between the two types of energies which dictate nucleosome positioning. Finally, we 
build a minimal model in which in vivo nucleosomes are positioned solely by potential barriers located 
at each end of the transcript. Without invoking explicit sequence specificity, the model successfully 
reproduces nucleosome occupancy patterns observed in vivo in S. cerevisiae. In contrast, sequence- 
dependent models can only capture liquid-like, delocalized behavior observed with in vitro nucleosomes 
[12. 13J. By combining the minimal model with sequence-specific nucleosome energies, we estimate that 
intrinsic histone-DNA interactions contribute < 30% to the height of the in vivo potential barriers. 



2 Theory 



2.1 Energetics of one-dimensional hard rods with nearest-neighbor interactions 



We consider the problem of interacting hard rods of length a — 147 bp confined to a one-dimensional 
lattice of length L bp (the length of the DNA segment) (Fig. [2]). Let u{k) be the external potential 
energy of a particle that occupies positions k through fc -I- a — 1 on the DNA (the one-body energy), 
and let 'P{k, I) be the two-body interaction between a pair of nearest-neighbor particles with starting 
positions k and I, respectively. Here u{k) describes intrinsic histone-DNA interactions, while ^(fc, I) 
accounts for the effects of chromatin structure. We assume that the DNA segment is surrounded by 
impenetrable walls, so that 

m(0) = u{L - a + 2) = u{L - a + 3) = . . . = u{L) = oo. 



Moreover, particle overlaps are not allowed and the two-body potential is short-range as in the Taka- 
hashi hard- rod model ^ ^ 

oo if / < fc -|- a, 
2a. 



,^ f OO if / < fc - 



The canonical partition function for a fixed number of particles N is given by 

Qj^ = ^ g-/3u(il)g-/3#(ii,i2)g-^M(i2) _ g-/9«(ijv-i)g-0<l>(ijv-l,»N)g-^u(i]v)^ ^J^-) 
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where f3 — l/kgT is the inverse temperature. 

Let us introduce two Imax x ^max matrices, where /,„ax = i — a + 1 is the rightmost starting position 
of a particle of length a: 

^ ' ' ' 1 otherwise, 
(fc|e|0 = e-'^"('=)4,/. 

Here 6kj is the Kronecker delta symbol, and {k\M\l) represents the element of matrix M in row k 
and column I (in Dirac notation). \l) is a column vector of dimension Zmax with 1 at position I and 
everywhere else, and {k\ is a row vector with 1 at position k. 

Defining | J) = Yl\=T 10 vector with 1 at every position), we rewrite Eq. ([T]) as 

j {J\{ew)^-'e\J) if7V>l, 
ifiV^O. 

The grand-canonical partition function is then given by 

z = J2 ^'''"'Qn = i + {J\ii -zwy'^lJ), (2) 

where /i is the chemical potential, A'n^ax = ["^J is the maximum number of particles that can fit on 
L bp, / is the identity matrix, and {k\z\l) = e'^[''~"('^'l(5fc,i. 
The s-particle distribution functions are defined as 

ns(zi, ...,is) = - 



nit) = ^{J\{I - zw)-'\t)mz\t){mi ~ wz)-'\J), (3) 



where ^{i) = e'^I^-"^*)! (see the chapter by Stell m^\). The one-particle distribution function is 

1 
Z 

and the two-particle, nearest-neighbor distribution function is 

Mhj) = - zwy'\i){i\zwz\j){j\il ~wz)-'\J). (4) 

These relations are easy to understand. To find the probability of starting a particle at position 
i [Eq. ([3|], we have to add the statistical weights of all configurations that contain a particle at that 
position, and divide the resulting sum by the partition function. Similarly, to find the probability of 
having a pair of nearest-neighbor particles with the starting positions i and j [Eq. Q] , we need to sum 
the statistical weights of all configurations that contain that pair of particles. 

Note that for short distances j ~ i < 2a, n2{i,j) is identical to the unrestricted two-particle 
distribution n2(i, j), because there is not enough space to put another particle between the two particles 
at i and j. In this paper we restrict ourselves to j — i < 2a, since we are interested in short-range 
interactions between nearest-neighbor pairs of particles. 

In many cases of interest the energetics of the system is unknown but the s-particlc distributions 
are available from experiment. Therefore, we wish to find the unknown energies u and (p from n and 
n2 by inverting Eqs. ([3| and (|4|. Let us define two matrices: {i\N\j) — n{i)Sij and (i|A''2|j) = n2{i,j)- 
Using these matrices and Eq.T2]), we have 

{J\{I ~ N2N-^)N\J) = ^{J\{I-zyj)-^z\J) = 

so that ^ 

1-{J\{I-N2N-^)N\J)- 
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After some matrix multiplications we obtain the following two equalities: 
and 

Substituting Eqs. ([5]), (|6| and ([7| into Eqs. (|3| and Q, we derive the exact expressions for one-body 
energies and two-body interactions [501151] : 

^ V 1- (j|(/-iV2iv-i)Ar|j) y ' ^' 

- mk I) - In i - ^-^K^ - ^ 

^ \ {k\I-N-^N2\J){J\I^N2N-^l) )■ 

If the two-body interactions other than the steric exclusion are neglected, then {k\w\l) 0{l~k—a), 
where 0{l ~k — a) is the Heaviside step function (1 if Z > fc a, and otherwise). In this case wc have: 



{J\{I ~ zwy^\i) = {J\I + zw + {zivf + . . .\i) ^Zl 
- wz)-^\J) = + wz + {wzf + ... |J> = Z\ 



where Zi and Zl are partial statistical sums which can be efficiently computed by iteration in a 
forward or reverse direction [51l22j. Note that Z = Z\ = z(^_^^-^. The partial statistical sums account 
for the contributions from all possible configurations of particles confined to the boxes [1, i] and [i,L], 
respectively. It can be shown that |22) : 



Z: 



f 



n 

3=1' 



l-0(j + l)+n(j + l) 



1 - 0{3) 

Oi j)+n{j) 
0{j) ' 



where 0{i) is the particle occupancy of bp i 0{i) = X]j=i-a+i ""(j) • 

Using Eq. ([3|, we reproduce the previous result from [52] which can be employed to find one-body 
energies from one-particle distribution in the case of hard-core interactions alone, 



-/3 -In 



n{i) 



1 - 0{i)+n{i) 



In 



i-\-a— 1 

n 



i-o(j) + »(j) 

1 - 0{j) 



(10) 



2.2 Predicting two-body interactions from one-particle distribution 

As shown above, there is a one-to-one correspondence between one-body energies u and two-body inter- 
actions on one hand, and particle distributions n and n2 on the other. Thus, if n and n2 are known, u 
and can be inferred exactly, and vice versa. However, in many situations the two-particle distribution 
is not directly available from experiments. For example, high-throughput nucleosome maps simultane- 
ously report nucleosome positions from many cells, effectively yielding a probabilistic description of the 
one-particle distribution n. Because of this averaging over single-cell configurations, information about 
the pair density profile n2 cannot be extracted directly. Nonetheless, if the two-body interactions are 
sufficiently strong, the one-particle distribution profile n can be used to obtain information about <f . 
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Let us introduce the dimensionless pair distribution 

n{i)n{]) 

Note that gii^j) — n2{i, j) /[n{i)n{j)] for short distances j ~ i < 2a, and that g{i,j) — g{j — i) in a 
homogeneous system. We start with a homogeneous system of N hard-rods which interact through an 
arbitrary nearest-neighbor potential (P, and then develop an approximation for the inhomogeneous case. 
In a translation-invariant continuous system with nearest-neighbor interactions of arbitrary strength, 
g-/3<p(d) _ Ce°"^g{d), where C and a are constants [^[5^1551134) . The result can also be proved for a 
lattice fluid of hard rods, as shown below. 

Proof Consider a system of N particles distributed on a segment of length L bp. We assume that the 
particles interact with each other through short-range nearest-neighbor interactions (which include 
steric exclusion if the particles have a finite size of a bp), and the total interaction energy is 

U{xi,X2, ■ . .,Xn) = 'P{X2 -Xi) + <P{X3 -X2) + ... + <PixN - XN-l) + Ub, 

where Ub is the boundary term which describes interaction between the walls and the first and last 
particles. 

For simplicity let us assume that the boundary conditions are enforced by two additional particles 
of the same kind fixed at 2: = and x = L, so that Ub = ^^(a;i) -I- <?(L — xn)- The exact form of 
boundary conditions is not essential in the thermodynamic limit. The canonical partition function of 
this system of N particles is 



E 



xjY— a;jv-l— xi—0 

Denoting f{x) = e~''*(^\ we obtain 

L Xm X2 

Qn{L) ■ ■ ■ £ /(^i - 0)/(x2 - . . . /(L - xn). 

a;7V— a;_/v-i— a;i— 

Note that this represents the convolution of + 1 functions /, 

Qjv(i) = (/*/*...*/)(i). 
^•^ V ■' 

N +1 functions 

The partition function can be computed using the z transform method. Let Q{z) be the z transform 
of Qn{L), 

OO 

Q(z) = ^QAr(n)z-". 

»i=0 

From the convolution theorem we have that 

Q{z) = \f(z) 

where F{z) is the z transform of /(n), 

OO 

The partition function can be recovered using the inverse z transform. 



Qn{L) = F{z) 



N+l 

z^-'z. 
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The contour of integration F is any simple closed curve enclosing \z\ = i?, \z\ > R being the region of 
convergence. 

Let us define h{z) = (iV + 1) InF(z) + (L - 1) Inz. With this notation 

1 

27ri jp 



'.N 



This integral can be computed by the saddle point method [34 . Expanding h{z) around the saddle 
point zq, we obtain 



1 

27ri 



gi/i"(zo)(z-zo)^2. 



Integration along the path of steepest descent yields a contribution from the Gaussian integral of 
order O {[h" {zq)]-^^'^) = 0{N-^/'^). Since we need In Qn(L) in order to compute the macroscopic 
quantities, and in the thermodynamic limit the terms of order O(lnA^) are not important, we can 
approximate the partition function as 



Qn{L) 

where zq is the saddle point, given by 

h 



L 



^F'izo) 



N 



N- 



0. 



(12) 



(13) 



We can compute the chemical potential for the interacting hard rods by taking the derivative of 
the free energy F with respect to the number of particles in the system 



OF ^ dlnQ^ . , 



dN 



dN 



(14) 



The pressure of the gas is given by the derivative of the free energy with respect to the length of 
the system. Let's denote the length of a base pair by 6, such that the real length of the system is Lb. 
We obtain 

IdF kBTd\nQN{L) kBT ^ . 
P=^T^ = ~ STT = ^— Inzo, (15) 



and from Eq. ( 12 ) we obtain 



h dL 



Qn{L) 



dL b 

N 



Fie' 



I3pb\ 



We use this result to compute the conditional probability of finding an adjacent particle at a 
distance d from the center of a fixed particle [5^(55] : 

P{d) = Prob(a;Ar = L — d\x]y+i = L) 
Qn^i{L - d) 



-m- 



iN 



For d < 2a the pair distribution becomes 

n2(i, i + d) 



9{d) 



pP{d) 



n{i)n{i + d) 

Thus in the homogeneous system the interaction between the particles has the form 

g-/3*(d) = Ce"'^g{d), 
where a = /3pb and C is a normalization constant. □ 



(16) 
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Fig. 3 (Color online) (a) g{i,j) = n2{i, j)/n{i)n{j) is plotted for a representative subset of all initial positions 
i in a 10* bp DNA segment. The one-body energies are randomly sampled from a Gaussian distribution with 
a mean of 2.5 ksT and a standard deviation of 0.2 fcsT, and 9 potential wells of depth 5 fesT are added at 
1, 2, . . . , 9 X 10'^ bp to model a strongly inhomogeneous system. n2{i,j) and n(i) are computed from one- and 
two-body energies using Eqs. (|3| and m. (b) PiinkGr(^), obtained by averaging g{i,j) over all initial positions 
i. Note that A — j — {i + 147) represents the linker length between the two nucleosomes with starting positions 
i and j, respectively, (c) Exact (solid blue line) and predicted (dotted black line) two-body in tera ctions. The 
predicted interaction was computed from the — In(Piinkcr) curve (dashed green line) using Eq. (17 1. 



In a more general case, the external potential breaks translational invariance, making g dependent 
on the absolute position of the first particle. However, if the two-body interaction (p is translationally 
invariant, a good approximation is provided by replacing g with Piin^orC^) — {gih i + a + A))i (averaged 
over all initial positions i) in Eq. (16) [27], 



p<P{i, j) w In [Piinkcr(j + a))] + a{j -i)+lnC 



(17) 



The constants C and a are uniquely determined by the asymptotic condition limQ_j-)_yoo (p^i^j) = 0. 
Eq. ([it]) provides an ansatz for reconstructing (p from Piinker(^) = (^2(*, i + a + A)/[n{i)n{i + A)])i. 

Fig.[3]shows a numerical test of this ansatz on a 10 kbp DNA segment. We construct a random one- 
body energy landscape and simulate strong inhomogeneity by positioning 9 potential wells with depth 
of Sfc^T at 1, 2, . . . , 9 kbp on the landscape. The model interaction between a pair of particles separated 
by a linker of length A is <?(/\) = 5 cos (f^j^le^'^^'^*' (in units of ksT). We use the one-body energies 
and the two-body potential as inputs to Eqs. ^ and ([ij, computing the dimensionless pair distribution 
function. The pair distribution varies significantly from bp to bp [Fig. [sj^a)], as can be expected in a 
system with one- and two-body energies of comparable magnitude. Following our prescription, we 
compute Piinkor by averaging over all the curves in Fig. |3]ja) [Fig. jsj^b)], and employ Eq. (17) to infer 
^ [Fig. |3][c)]. The correlation coefficient between predicted and exact two-body interactions is greater 
than 0.999. 

If a direct measurement of the pair distribution n2 is not available, Piinkor needs to be estimated 
empirically from the n profile. Each nucleosome positioning data set consists of the histogram of the 
number of nucleosomes starting at each genomic bp i. We preprocess these data by removing all counts 
of height 1 from the histogram and smoothing the remaining counts with a cr = 2 Gaussian kernel. 
Next, we compute n[i) by rescaling the smoothed profile so that the maximum occupancy for each 
chromosome is 1. Finally, we identify all local maxima on the n profile and assume that they mark 
prevalent nucleosome positions. Specifically, for each maximum at bp i we find subsequent maxima at 
positions i + 146 < ji < J2 < js < ■ ■ • in the 50 bp window. To each pair of maxima (i, j2), • ■ ■ 

we assign the probability that they represent neighboring nucleosomes: n(«)n(ji), n{i)[l — n{ji)]n{j2), 
and so on. We sum the probabilities over all initial positions i and normalize, producing an empirical 
estimate of Punkcr ■ 
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2.3 Comparison between lattice and continuous one-dimensional fluids 



The formalism presented in the previous subsection can be used to obtain the equation of state and 
chemical potential for any generic interaction (l>{x). Let us consider two simple cases: the ideal gas and 
the Tonks lattice gas, which is characterized by the hard-core interaction 



oo if X < a, 
if X > a. 



For the ideal gas, the z transform of e and the saddle point zq [obtained from Eq. (13)] are 
given by: 



F{z) 



z 

L + N 
N 



Using these expressions, we can compute the logarithm of the partition function [Eq. (12|] 



InQjv(i) =Lln 



L + N 
L 



Nln 



L + N 
N 



O(lniV), 



which gives the pressure and the chemical potential for the ideal lattice gas: 



N 



I3fi\'^ = In 



N 
L + N 



(18) 
(19) 



In the case of the Tonks lattice gas. 



F{z) 



L- Na + N 



L-Na 

The pressure and the chemical potential arc then given by: 

N 



P^J = a In 



L-Na 
L-Na + N 
L-Na 



In 



N 

L-Na + N 



(20) 
(21) 



It is useful to compare these results with the corresponding results for continuous one-dimensional 
gases. Denoting the physical length of the particles by A, the length of the box by £, and the Laplace 
transform of e^*^* by 



we obtain the canonical partition function as the inverse Laplace transform 



1 



1 



[\{T)\ ' 



N 



where A(T) = h/ \/2TrmkBT is the thermal de Broglie wavelength, and sq is the saddle point given by 
the equation 

jr + N^^O. 



10 



Using the previous two equations, we find for tlie ideal gas [(p{s) — ^, and so = ■ 

id cflnQcont N 



Similarly, for the Tonks gas we obtain 



dC 
d In Qc 



ON 



In 



NXjT) 

c ■ 



ip{s) = 
N 

C~NA' 

NA 
C-NA 



In 



and So 



NX{T) 
C-NA 



N 



C-NA 



(22) 
(23) 

(24) 
(25) 



To compare continuous and discrete results, we let the lattice constant b approach 0, while keeping 
the particle size {A — ah) and the box size (£ = Lb) finite. We obtain: 



lim Pp'{^ = lim ^ In ( 1 



1 



fa^O b 
1 



lim /3pi = lim - In 1 



b->-0 



b-i-O b 



Nb\ 

^) 
Nb 



C-NA 



Similarly, the chemical potentials for the ideal and Tonks lattice gases become asymptotically, as 
6^0: 



(3nf - In 



Nb 
NA 



C-NA 



In 



Nb 



C-NA 



These expressions are identical to the chemical potentials of the corresponding continuous gases 
[Eqs. (23) and (25)], with the length scale A(T) replaced by the typical length scale of the lattice, 
b. 



2.4 Sequence-specific energy of nucleosome formation 



We can extract a sequence-specific component of the one-body energy by using Eqs. (|8| or ( 10 ) to 
compute u — fj,, estimating the chemical potential /z, and fitting the one-body energy u to a linear 
model which assigns energies to nucleotide words found within the a — 147 bp nucleosomal site. 



Assuming that the system is nearly homogeneous, we use Eqs. (14) or (21) to obtain the chemical 
potential of the lattice gas. After eliminating fj,, we fit a linear model to one-body energies u. It 
was established in Ref. [22] that position-independent models in which the energy of the nucleotide 
word does not depend on its exact location within the nucleosome can be used to describe genome- 
wide nucleosome occupancies. Furthermore, an = 2 position-independent model with just 13 fitting 
parameters performed as well as A^ > 2 models [here, N denotes the longest word (in bp) included 
into the model]. 

If both monomers and dimers contribute to the total one-body energy, the sequence-specific binding 
energy of a 147 bp-long nucleosomal site is given by 

= '^maea + '^mai3eap + eo, (26) 

where TOq, is the number of nucleotides of type a € {A,C,G,T}, is the energy of the nucleotide 
a, and eg is the overall sequence-independent offset. Similarly, map is the number of dinucleotides 
of type a/3, and is the corresponding energy. In Ref. [12], word energies were constrained by 
= ^ap = ^ajS = 0, yielding a 13-parameter model. Here we develop an alternative 
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approach which does not impose any additional constraints beyond those caused by the fact that the 
number of mono- and dinucleotides in the 147 bp-long site is fixed. 
We can express the nucleosome energies as u = Mx, or equivalently 



"12,1 



m2,20 1 



max 



\^21 



where u^{i) is the sequence-specific energy of the nucleosome that covers the DNA sequence between 
bps i and i + a—1. rrii^i, . . . , mi^4 give the number of A, C, G and T nucleotides found in that sequence, 
and mi. 5, . . . , mi.20 give the number of dinucleotides AA, AC, . . . , TG, TT. Imax = i — a -I- 1 is the 
maximum starting position for a nucleosome, and the set of parameters xi, . . .X21 represents the 21 
energies from Eq. (l26|: Xi = e^, X2 = e^, ■ • ■, 2:20 = ett, 2^21 = eo- 



Note that for any DNA sequence of length 147 bp, 

rrit^i + TOi,2 + m^.s + ™i,4 = 147 TOi^2i, 
rrii.s + rrii^e + .. . + mi,20 = 146 mi^2i- 

This means that the rank of M is 19. For any linear operator (M), the dimension of its domain (21 in 
our case) is equal to the sum of the dimensions of the image im(M) in the energy space (19) and the 
kernel ker(M) in the parameter space (2). 
It is easy to check that the vectors 
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belong to ker(M), so that ker(M) = span(x*, x**). 

Any vector of parameters x can be uniquely decomposed as x = xk + x-*-, with x.k G ker(M) 
and X"*^ g ker(M)-'-, the subspace which is orthogonal to ker(M). Specifically, x-'- = x — (x*,x)x* — 
(x**,x)x**. The component xk does not contribute to the energy of the sequence because Mx^f is 
the null vector. The components of x-*- satisfy the following relations: 



(x^,x*) = 0^^a;^-147 x^i = 0, 
1=1 
20 

(x-L,x**) = ^ ^.T^ - 146 = 0. 



1=5 



Thus x^ has only 19 independent parameters, and 
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In order to compare two different sets of 21 energies (e.g. fit on different genomes), we need to 
eliminate the components of these two vectors included in ker(M). The components from ker(M)^ 
will have 19 independent parameters and 2 redundant ones [Eq. (27)]. The projection of the energy 



vector on the im(M) hyperplane is unique, and there is a one-to-one correspondence between im(M) 
and the parameter subspace which is orthogonal to the kernel, ker(M)-'-. In this way, every set of fitted 
energies uniquely determines a set of parameters (x-*-) and a sequence-specific energy (u = Mx^'-). 
For the N = 1 model, ker(M) is spanned by a single vector 

/ 1 \ 

1 

1 
1 

\-147/ 

and x^ has 4 relevant parameters and a redundant one 
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Similarly, the TV = 3 model has 85 (4 + 16 + 64 + 1) fitting parameters and there are six independent 
constraints on the columns of M, so that the rank of the operator M is 79. The kernel of M is spanned 
by six vectors, and the parameter subspace orthogonal to the kernel (which gives the sequence energy) 
is 79-dimensional. For the N — 4 and the N — 5 models, the total number of parameters is 341 and 
1365, and the number of independent parameters is 319 and 1279, respectively. 

When the N = 2, 21-parameter model described above is trained on the energies predicted by 
applying Eq. ( 10 1 to a large-scale map of nucleosomes reconstituted in vitro on yeast genomic DNA 



[T3], it captures the same sequence determinants as our previously used 13-parameter model which 
employs additional constraints [22] (r = 0.9967 between the two sequence-specific energy profiles). 
However, the two approaches are not equivalent, since the 21-parameter model utilizes the maximum 
possible number of independent fitting parameters. 



3 Results 

3.1 Reconstructing nucleosome energetics in a model system 

In the absence of nearest-neighbor interactions induced by chromatin structure, nucleosome formation 
in vitro is fully controlled by DNA sequence and steric exclusion. In this case efficient procedures 
are available for reconstructing nucleosome positions from formation energies [51l36| and for inferring 



nucleosome energetics from experimentally available probability and occupancy profiles [Eq. ( 10 )] 
However, this simple approach may lead to errors if the two-body interactions are in fact present in 
the system. Furthermore, many factors other than DNA sequence can affect nucleosome positioning in 
vivo, including chromatin remodeling enzymes, non-histone DNA-binding factors, and components of 
transcriptional machinery P^IT^I^U] . These influences are expected to create potential barriers which 
prevent nucleosomes from forming in certain regions, and potential wells which localize nucleosomes 
through favorable contacts between histones and other proteins. These effects will be lost if a purely 
sequence-specific model is fit to the nucleosome positioning data. 
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We use a simple model system to illustrate the errors caused by neglecting higher-order chromatin 
structure and in vivo potentials (Fig. |4]). We have generated a random 10** bp DNA fragment and 
computed one-body sequence-dependent energies using the 21-parameter N — 2 position-independent 
model (section |2.4[ ) . The sequence-specific word energies for the model were randomly sampled from a 
[—0.02 ksT, 0.02 ksT] uniform distribution. Fig. Wa) shows sequence-dependent nucleosome energies 
in a representative 500 bp window (blue solid line). The window also includes one of the 3 fc^T wells 
placed every 2000 bp throughout the sequence to model in vivo effects; the total one-body energy is 
shown as a green dash-dot-dot line. 

The total energy can be used together with the two-body interaction shown in Fig. |4](b) (blue solid 
line) to construct the exact one-body density profile n(i) and the corresponding nucleosome occupancy 
for the DNA segment [Fig. |4|^c), green dash-dot-dot line]. If we now use Eq. (10 1 which neglects two- 
body interactions to reconstruct one-body energies from n(i), the predicted energy profile captures the 
potential wells and the sequence-specific component but also displays spurious 10 bp oscillations caused 
by the "leakage" of the two-body potential ^ into one-body energetics [Fig. |4][a), red dashed hne]. In 
addition, the whole landscape is shifted downward because favorable two-body interactions are missing 
from the model. The "leaked" oscillations and the in vivo wells have no relation to sequence and can 
be removed by fitting either the 13- or the 21-parameter model to the prediction [Fig.jljja), light blue 
dash-dot line]. The two predicted energy profiles are highly correlated with each other (r = 0.9993) and 
with the exact profile (r = 0.9913 for the 13-parameter model, r = 0.9915 for the 21-parameter model), 
indicating that the sequence-specific component can be extracted even if the two-body interactions are 
not handled correctly. 

Predicting occupancies from the energy profiles constructed under the (p — assumption causes 
discrepancies with the exact result, shown in Fig. |4|^c) as a green dash-dot-dot line. For example, using 
the one-body energies predicted with Eq. (10) [Fig. Hi a), red dashed line] and the two-body potential 
predicted with Eq. ([Tt]) [Fig. |4][b), green dashed line] gives an occupancy profile with higher average 
occupancy, sharp peaks, and enhanced 10 bp oscillations compared to the exact landscape [Fig. [4](c), 
red dashed line]. This is not unexpected because the two-body potential is both imprinted in the 
one-body profile and included explicitly. In contrast, if <^ is neglected at this stage as well, the exact 
occupancy can be restored from u° [Eq. ([To]) and its inverse do not entail any information loss], but 
the origin of various contributions remains unclear as they are all lumped into the one-body landscape. 

When the 21-parameter model is fit to the = profile [Fig. Qa), light blue dash-dot line] 
and combined with the predicted <P [Fig. |4|^b) , green dashed line] , the occupancy is off since in vivo 
potential wells cannot be captured by this model [Fig. ^c) , light blue dash-dot line] . Nucleosomes are 
not strongly localized if the in vivo wells and barriers are absent [Fig. |4]Jc), blue solid line], consistent 
with the relatively smooth in vitro occupancy profiles |12 p i3 ) . Note that the two occupancy profiles 
will coincide if the mean of the predicted one-body energies is set to the correct value, eliminating the 
spurious offset caused by <I> [Fig.Qa), cf. blue solid and light blue dash-dot lines]. 

In order to reconstruct the occupancy correctly and avoid mixing one-body and two-body contribu- 
tions, we need to turn to the full theory developed in section 2.1 Inserting predicted [Fig.|4][b), green 
dashed line] into Eq. ^ and using the exact n{i) profile, we obtain a system of nonlinear equations 
which can be solved numerically, yielding energies that are very close to the exact result [Fig. |4]^a), 
maroon dotted line]. These energies and the predicted ^ can be used to reconstruct the occupancy 
profile which is nearly exact [Fig.[4jc), cf. green dash-dot-dot and maroon dotted lines]. Thus wc have 
succeeded in separating one- and two-body effects, and in splitting off the sequence-dependent part in 
the former. However, the full procedure is computationally intensive and becomes inefficient if the DNA 
is longer than 10^ bp (longer segments may be split into manageable pieces and handled separately). 



3.2 Nucleosome localization by potential barriers and wells 

Nucleosomes in the vicinity of potential barriers and wells can be localized by steric exclusion alone 
|17j . This mechanism is thought to contribute to prominent nucleosome occupancy peaks in genie 
regions observed in vivo but not in vitro |12lll3irHlll51ll61l21j . In order to understand the nature and 
the extent of in vivo nucleosome localization, we need to study nucleosome occupancy patterns created 
by placing a single potential barrier or potential well onto an otherwise flat one-body energy landscape. 

In Fig.[5]we show nucleosome occupancy induced by a symmetric Gaussian barrier, with and without 
two-body interactions. As the chemical potential is changed to increase the average occupancy, the 
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Fig. 4 ( Color online) (a) One-body energies. Sequence-dependent energy given by the 21-parameter model 
[Eq. (26l] (blue solid line), total energy given by the sum of the sequence-specific energies and 5 potential 
wells with 3 fcsT depth at 1,3,5,7, and 9 xlO^ bp designed to mimic the in vivo effects (green dash-dot- 
dot line). Energy predicted with a model that neglects two-body interaction s [E g. ( |10[ )] (red dashed line), 
energy predicted by fitting the 21-parameter model to the energies from Eq. I\W\ (light blue dash-dot line), 
a numerical solution of the full model which takes $ into account (maroon dotted line). Inset: zoom-in on 
the r egio n with the potential well, (b) Exact two-body interaction <P (blue solid line) and predicted interaction 
[Eq. ( 17 1] (green dashed line), (c) Nucleosome occupancies. Occupancy generated by the exact sequence-specific 
one-body energy and the exact interaction (blue solid line), occupancy corresponding to the combined exact 
one-body energy (sequence-specific component and potential wells) and the exac t interaction (green dash-dot- 
dot line). Predicted occupancy generated by the one-body energy from Eq. (10 1 and predicted $ (red dashed 
line), occupancy generated using predicted sequence-dependent one-body energy [Eq. (26l] and predicted $ 
(light blue dash-dot line), occupancy predicted using numerically computed one-body energies from the full 
model and predicted $ (maroon dotted line). 



oscillations become more prominent. Without two-body interactions, the peak closest to the barrier is 
always the highest and the occupancy pattern is a decaying oscillation [Fig.|5j[a)]. Strikingly, including 
<P results in a markedly different occupancy profile: oscillations are more persistent and the peak closest 
to the barrier is not always the highest [Fig. [5jb)]. 

The degree of nucleosome localization is also controlled by the width of the Gaussian barrier: wider 
barriers induce less prominent oscillations, but produce stronger occupancy depletion over the barrier 
itselfjFigs. [sjjc) andjsjd)]. The degree of depletion is also controlled by the barrier height [Figs, [sjje) 
and pi f)]. Interestingly, increasing the strength of two-body interactions results in a higher average 
occupancy and produces shorter peak-to-peak distances [Fig.jsj^g)]. In fact, the peak-to-peak distances 
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Fig. 5 (Color online) Symmetric Gaussian barrier. Occupancy profiles for the following scenarios: variable 
chemical potential fi (a) and (b), variable barrier width (c) and (d), and variable barrier height (e) and (f). 
Unless otherwise specified in the legend, the barrier heights are 5 ksT, cr = 30 bp, and {u — ^} = 5 fesT [in panel 
(c) (w-/x) = 1 ksT]. Panels (b), (d) and (f) have a two-body interaction ${A) = A cos (27rZi/10) exp(-Z\/50), 
with A = 5 ksT. (g) Occupancy profiles for variable interaction strength A. (h) Variation of the typical distance 
between neighboring nucleosomes as /i or yl is varied. Upper panel: = 0, lower panel: (it) — /i = 5 fcsT. 



(which can be interpreted as the sum of the 147 bp nucleosomal site and a linker) can be varied in a 
wide range by changing either the chemical potential /i or the strength of <!> [Fig. plh)]. 

Similar conclusions can be reached if the Gaussian barrier is replaced by a symmetric Gaussian 
potential well (Fig. [6|: oscillations decay less rapidly with two-body interactions, and the extent of 
oscillations is controlled by the chemical potential and by the depth and the width of the well. However, 
in this case the nucleosome closest to the well is always the most localized. Nucleosome occupancy 
in the vicinity of 5' NDRs is prominently asymmetric |131I16| . This asymmetry can be modeled by a 
combination of a symmetric barrier with an adjacent potential well [57j, or by a single asymmetric 
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Fig. 6 (Color online) Symmetric Gaussian well. Occupancy profiles for the scenarios described in Fig. I All 
the parameters not explicitly given in the legends are from Fig. [5] In particular, well depths have the same 
magnitude as the heights of the corresponding barriers. 



barrier. In Fig. [7] we show how nucleosonie localization and the degree of asymmetry in the occupancy 
profile vary with the chemical potential, the strength of the two-body interaction, the height of the 
barrier, and the degree of its asymmetry. 

In summary, two-body interactions significantly modify nucleosome occupancy profiles, affecting 
heights and spacings of observed nucleosome localization peaks. However, as the interaction itself only 
couples neighboring nucleosomes but does not determine their absolute positions, a potential barrier or 
well is required to achieve localization in the first place. Increasing the width of this feature diminishes 
its localization capacity. The interaction favors configurations with linker lengths corresponding to the 
minima of ^, leading to linker length discretization '^^flEfTP' . By changing the interaction strength 
or the chemical potential one can create occupancy patterns with different average linker lengths. 
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Fig. 7 (Color online) Asymmetric Gaussian barrier. Occupancy profiles for the scenarios described in Fig. [s] 
Unless otherwise specified in the legend, the barrier heights are 5 fcsT, ctl = 70 bp, or = 30 bp, and {u — ji) — 
5 fesT [in panel (c) {u - n) = 1 ksT]. 



3.3 Modeling nucleosome occupancy over transcribed regions 

The characteristic patterns of nucleosome occupancy in the region between 5' and 3' NDRs are shown 
in Fig. [8 As discussed in the Introduction, there is a pronounced lack of nucleosome localization in 
vitro [T2l[T^ [Figs.[8|^a) andjsf^b)]. The 21-parameter N — 2 position-independent model captures this 



liquid-like behavior correctly but is unable to account for the in vivo peaks. Since DNA sequence alone 
clearly cannot produce the observed degree of in vivo localization, we sought to construct a minimal 
model in which potential barriers of non-sequence origin flank each gene and the one-body energy 
landscape is flat otherwise [571155] (Fig. [o]) . 

In the Kaplan et al. dataset [H], the first nucleosome is in fact the most localized and the average 
profile is consistent with the absence of two-body interactions [Fig. Isla)]. In contrast, Zawadzki et 
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Fig. 8 (Color online) Average nucleosome occupancy in the vicinity of transcription start and termination 
sites (TSS and TTS, respectively). Each occupancy profile is normalized by its average in the [—500, 500] bp 
window, (a), (b): Nucle osom e occupancy observed in vivo (YPD medium) and tn vitro by Kaplan et al. [12] and 
in vitro by Zhang et al. [13], and predicted using a 21-parameter N = 2 position-independent model, a minimal 
model in which nucleosomes are localized purely by means of sequence-independent potential barriers (Fig.|9|, 
and a combined model in which sequence-specific energies from the 21-parameter jV = 2 model are added to 
the barriers from Fig. [9] The two-body potential is turned off. Note that i n Re f. [13| DNA was mixed with 
histones in a 1:1 mass ratio which is close to the in vivo value, while in Ref. [12] the r atio was 0.4:1, resulting 
in d eeper NDRs. (c), (d): Nucleosome occupancy observed m vivo by Zawadzki et al. [21] and Mavrich et al. 
|16| . and predicted using the 21-parameter N — 2 position-independent model, the minimal model, and the 
combined model. The two-body potential is given by ^(zi) — Acos (27rZ\/10) exp(— zi/50), with A = 5 ksT. 



al. |21l and Mavrich et al. |16j profiles appear to be shaped by the higher-order chromatin structure 
[Fig. p[c)]. This experimental discrepancy may have resulted from under-digesting chromatin with 
micrococcal nuclease (MNase), an enzyme typically used to liberate mononucleosome cores [31]. In 
addition, the number of active genes that presumably reside in more open, active chromatin charac- 
terized by weaker two-body interactions could vary between experiments. However, in all three cases 
the in vivo barriers are necessary to reproduce observed localization patterns. The 5' NDR is strongly 
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Fig. 9 (Color online) The one-body energy profiles used in Figs, [s] and 
fioft ~ 80 bp and (Tright = 30 bp. The 3' symmetric barrier has cr = 80 
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asymmetric [Figs, [sjja) andjsjc)] and thus needs to be modeled either with a combination of a sym- 
metric barrier and a potential well for the -1-1 nucleosome |27| . or with a single asymmetric barrier 
(Fig.[7|. The height of each barrier in Fig.[9]is adjusted to reproduce the extent of observed nucleosome 
depletion. 

The average occupancy profiles are not significantly altered if sequence-specific energies from the 
21-parameter N — 2 position-independent model are added to the barriers from Fig. [o] (Fig. |8] cf. 
combined and minimal models). The N = 2 model yields a standard deviation of 0.61 fcsT for the 
energies genome- wide, consistent with the assumption that sequence-dependent energies should be less 
than 1 ksT, and thus the one-body landscape is still dominated by the barriers. Note that barrier 
heights are reduced in the combined model because sequence-dependent nucleosome depletion over 
NDRs is now included explicitly. 

The difference between the minimal and the combined models is more pronounced if individual 



occupancy profiles are displayed as a heat map (Fig. 10). Minimal model barriers adjacent to each 
other on the genomic sequence {e.g. the 5' barriers of two divergent genes sharing a single promoter) 
sometimes create anomalous NDRs with the extent of nucleosome depletion not observed in the data 
[Figs. [lO|e) and 10 'f)]. Interestingly, these effects are reduced when sequence specificity is combined 
with the minimal model [Figs. 



bined models (Fig. [9|, we cone 
of the barriers. 



W g) andnOTh)]. Comparing barrier heights in the minimal and com- 



ude that intrinsic histone-DNA interactions are responsible for < 30% 



4 Discussion and Conclusion 

We have developed a theory of nucleosomes positioned by both intrinsic histone-DNA interactions and 
a two-body, nearest-neighbor potential induced by higher-order chromatin structure. Our theory is 
general and can be applied to any system of finite-size particles in an external field of arbitrary strength, 
interacting via a short-range two-body potential. From one- and two-body nucleosome energies we can 
reconstruct one- and two-particle distribution functions exactly. Conversely, one- and two-particle 
distributions available from experiment can be used to predict both intrinsic histone-DNA interactions 
and the two-body potential with high accuracy, even in strongly inhomogeneous systems dominated 
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Fig. 10 (Color online) Heat maps of nuc leosome occupancy around TSS and TTS for 5747 S. cerevisiae genes. 
In VIVO nucleosomes (YPD medium) [12] (a) and (b), N — 2 position-independent model (c) and (d), minimal 
model (e) and (f), combined model (g) and (h). The minimal model is constructed by placing potential barriers 
from Fig. [9] at the end of each gene onto an otherwise flat one-body energy landscape without two-body 
interactions. The combined model is constructed by adding sequence-specific energies from the 21-parameter 
N = 2 position-independent model (which have standard deviation of 0.61 fcsT genome-wide) to the minimal 
model. The occupancy for each gene is normalized by the average occupancy in the [—500, 500] bp window. The 
experimental data [(a) and (b)] are smoothed with a 2D Gaussian kernel {ax = 1 bp and ay = 2 genes). The 
genes are sorted in each panel in the order of increasing variance of the occupancy. The genome-wide average 
occupancies are 0.1508 [(a) and (b)], 0.2024 [(c) and (d)], 0.7516 [(e) and (f)], and 0.7232 [(g) and (h)]. 



by one-body forces. Finally, we infer sequence determinants of nucleosome positions through a linear 
fit of predicted one-body energies to a model in which each mono- and dinucleotide is assigned the 
same energy regardless of its position within the nucleosomal site (the N — 2 position-independent 
model). In contrast to our previous approach which employed a hierarchy of energy constraints |22| . 
here we chose to fit all available parameters. The fitted parameters are subsequently projected onto 
the relevant subspace, creating a non-degenerate description of nucleosome energetics. 

Unfortunately, high-throughput maps of nucleosome positions typically yield just one-particle dis- 
tributions - two-body information is lost because the positional data is averaged over many cells. 
Nonetheless, the two-body potential can be inferred from one-body density profile n(i) if <!> is suffi- 
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ciently strong to create 10—11 bp periodic occupancy oscillations in the vicinity of potential wells and 
barriers which tend to localize nucleosomes (Fig. l4|. These oscillations can then be used to predict 



^linker, the distribution of linker lengths, which gives <!> via Eq. (17 1. This empirical approach fails in 
the absence of one-body energies because arrays of nucleosomes (even subject to two-body forces) can 
freely move along the DNA without any energy cost, resulting in a constant n{i) from which Punkor 
cannot be deduced. 

We have previously shown that our approach is capable of separating the 10—11 bp periodic 
two-body potential from the one-body contribution which is responsible for rotational nucleosome 
positioning and thus has the same periodicity of DNA helical twist [27] . Furthermore, we have inferred 
two-body interactions from genome-wide maps of nucleosomes assembled in vitro on genomic DNA, 
and have demonstrated their essential role in explaining the observed autocorrelation of genome-wide 
nucleosome positions, and in shaping the occupancy profile over transcribed regions. 

However, the full analysis, which involves finding u{i) from a system of nonlinear equations obtained 
by inserting ^ from Eq. (|17|) into Eq. ([3|), is computationally expensive. A simpler approach which 



neglects two-body interactions [Eq. ( 10 )]Ts much faster but leads to "leakage" of two-body effects into 
predicted one-body potentials if <!> is in fact present in the system (Fig. [4|. Nonetheless, if a sequence- 
specific model is fit to this one-body profile, the correct sequence-dependent energy is obtained because 
oscillations caused by are not related to any sequence features. Thus if one only needs to determine 
sequence determinants of nucleosome positioning, the two-body potential can be neglected altogether 
and a simpler, computationally efficient approach can be used. Nevertheless, only the full theory 
presented here is capable of disentangling one- and two-body effects. 

In vivo, nucleosomes adjacent to 5' and 3' NDRs are positioned by steric exclusion which creates 
prominent localized peaks of nucleosome occupancy in the vicinity of each depleted region [T2l[T4l[T5l 
[TB] . This localization is absent in vitro P^ITB] , indicating that intrinsic histone-DNA interactions can 
only partially account for in vivo occupancy patterns, creating barriers with heights less than 1 ksT. 
Therefore, sequence-specific models need to be combined in vivo with additional potentials established 
by RNA polymerase, ATP-dependent chromatin remodeling enzymes, and DNA-binding proteins. 

To this end, we have studied nucleosome positioning by potential wells and barriers, showing in 
particular that the distance between neighboring peaks in the nucleosome occupancy profile depends 
on both histone octamer concentration and the strength of two-body interactions (Figs. [5] [6] and [?]). 
We then created a genome-wide model of nucleosome occupancy by flanking each transcribed region 
with potential barriers whose height, width and the degree of asymmetry were adjusted to reproduce 
the observed occupancy patterns (Fig. [9]). This minimal, sequence-independent model successfully 
captures the main features of in vivo occupancy, although, surprisingly, the role of ^ seems to vary 
between experiments (Fig.jsj). When the minimal model is combined with the sequence-specific energies 
(with magnitudes < 1 fc^T as discussed above), occupancy patterns of a subset of genes get closer to 



experiment (Fig. 10), but the effect of DNA sequence on the average occupancy profile is rather small 



(Fig.[8f. 

In summary, we have presented a rigorous theory of nucleosomes subject to nearest-neighbor two- 
body interactions, demonstrating the essential role of chromatin structure in genome-wide models 
of nucleosome occupancy. Future experiments focused on multi-nucleosome distributions will provide 
data for our exact theory [Eqs. Q and (|9|], obviating the need for the approximate Eq. (17 1. Further 



experimental and computational work should also be aimed at pinpointing the origin of in vivo potential 
barriers in various organisms and cell types. 
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